EN FR
EN FR


Section: Scientific Foundations

Optimal Decision Making under Uncertainty

Participants : Olivier Teytaud [correspondent] , Jean-Joseph Christophe, Adrien Couëtoux, Hassen Doghmen, Jérémie Decock, Nicolas Galichet, Manuel Loth, Marc Schoenauer, Michèle Sebag.

The UCT-SIG works on sequential optimization problems, where a decision has to be made at each time step along a finite time horizon, and the underlying problem involves uncertainties along an either adversarial or stochastic setting.

Application domains include energy management at various time scales and more generally planning, on the one hand, and games (Go, MineSweeper, NoGo [12] ) on the other hand.

The main advances done this year include:

  • In the domain of computer Go some new performances have been realized [16] and survey papers have been published in Communications of the ACM [10] and as a chapter [65] .

  • The extension of Upper Confidence Trees to continuous or large domains (states and/or actions) and to domains with high expertise or strong structure has been done [37] , [31] , [38] .

  • The extension of Upper Confidence Trees to multi-objective settings has been done, improving on scalarization-based multi-objective reinforcement learning [56] .

  • Anytime algorithms for discrete time control (note that the classical stochastic dynamic programming is by no means anytime) have been developed and integrated in the Metis software (section 5.1 ), based on the above cited results.

  • Hybrid approaches combining upper confidence trees and e.g. direct policy search or domain specific approaches to yield robust performance w.r.t. long-term effects and take advantage of the combinatorial structure of the domain have been designed, specifically including problem-specific expertise in the playout phase in the domain of job-shop scheduling [53] or MineSweeper [54] .

  • Optimization algorithms for direct policy search have been designed [55] .

  • Within the European STREP Mash project, our favorite tools (in particular Monte-Carlo Tree Search) have been extended to difficult settings with no possibility to “undo” a decision [63] . The notion of “risk of exploration” has been investigated [59] .

  • An experimental analysis of bandit algorithms for small budget cases [36] got the excellent paper award at TAAI 2012.

  • In collaboration with Christian Shulte (KTH, Stockholm), one of the main contributors to the well-known general-purpose CP solver GECODE (http://www.gecode.org/ ), and within the Microsoft-Inria joint lab- Adapt project, ideas from UCT have been integrated in GECODE and applied to the job-shop scheduling problem with good first results [52] .

  • The optimization of low-discrepancy sequences has been done, improving on the best results so far [7] ; note that low-discrepancy sequences have been exploited in quite a few of our past works.